feat: Expr.var with ddof #6105
base: main
Greptile Overview
Greptile Summary
Added variance aggregation function (…)
Key Changes:
Implementation Notes:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Python API
participant DSL Layer
participant Local Plan
participant Core Ops
participant Stats Utils
User->>Python API: df.agg(col("x").var(ddof=1))
Python API->>DSL Layer: expr.var(ddof)
DSL Layer->>DSL Layer: Create AggExpr::Var(expr, ddof)
alt Single Partition
DSL Layer->>Core Ops: Series.var(groups=None, ddof)
Core Ops->>Core Ops: Cast to Float64
Core Ops->>Stats Utils: calculate_stats(array)
Stats Utils-->>Core Ops: Stats{sum, count, mean}
Core Ops->>Stats Utils: calculate_variance(stats, values, ddof)
Note over Stats Utils: variance = Σ(x - mean)² / (n - ddof)
Stats Utils-->>Core Ops: variance value
Core Ops-->>User: Result
else Multiple Partitions
DSL Layer->>Local Plan: populate_aggregation_stages
Note over Local Plan: First Stage (per partition)
Local Plan->>Core Ops: Sum(X), Sum(X²), Count(X)
Core Ops-->>Local Plan: Partial results
Note over Local Plan: Second Stage (combine partitions)
Local Plan->>Local Plan: Sum(sums), Sum(sq_sums), Sum(counts)
Note over Local Plan: Final Stage
Local Plan->>Local Plan: pop_var = (sq_sum/n) - (sum/n)²
Local Plan->>Local Plan: sample_var = pop_var * n/(n-ddof)
Local Plan->>Local Plan: if n <= ddof then null else sample_var
Local Plan-->>User: Result
end
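The multi-partition path in the diagram can be sketched in plain Python. This is a hypothetical, self-contained model of the three stages, not Daft's code: the names `VarState`, `first_stage`, `second_stage`, `final_stage` and `two_pass_var` are illustrative only, and it assumes nulls are skipped before aggregating. The finalize step agrees with the single-partition formula because pop_var · n = Σx² − (Σx)²/n = Σ(x − mean)², so pop_var · n/(n − ddof) = Σ(x − mean)²/(n − ddof).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VarState:
    """Hypothetical partial state for one partition: Sum(X), Sum(X^2), Count(X)."""

    total: float = 0.0
    sq_total: float = 0.0
    count: int = 0


def first_stage(values: Iterable[Optional[float]]) -> VarState:
    """Per-partition pass: accumulate the sum, sum of squares and non-null count."""
    total = sq_total = 0.0
    count = 0
    for v in values:
        if v is None:  # assumption: nulls are ignored
            continue
        x = float(v)  # mirrors the "Cast to Float64" step
        total += x
        sq_total += x * x
        count += 1
    return VarState(total, sq_total, count)


def second_stage(states: Iterable[VarState]) -> VarState:
    """Combine partitions by summing each component of the partial state."""
    total = sq_total = 0.0
    count = 0
    for s in states:
        total += s.total
        sq_total += s.sq_total
        count += s.count
    return VarState(total, sq_total, count)


def final_stage(state: VarState, ddof: int = 1) -> Optional[float]:
    """pop_var = sq_sum/n - (sum/n)^2, rescaled by n/(n - ddof); null if n <= ddof."""
    n = state.count
    if n <= ddof:
        return None
    mean = state.total / n
    pop_var = state.sq_total / n - mean * mean
    return pop_var * n / (n - ddof)


def two_pass_var(values: Iterable[Optional[float]], ddof: int = 1) -> Optional[float]:
    """Single-partition path: compute the mean, then squared deviations / (n - ddof)."""
    xs: List[float] = [float(v) for v in values if v is not None]
    n = len(xs)
    if n <= ddof:
        return None
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


# Three partitions holding 1..5 plus one null.
partitions = [[1.0, 2.0, None], [3.0, 4.0], [5.0]]
merged = second_stage(first_stage(p) for p in partitions)
print(final_stage(merged, ddof=1))  # -> 2.5 (sample variance)
print(final_stage(merged, ddof=0))  # -> 2.0 (population variance)
```

Summing raw squares keeps each partition's state to three numbers that merge by plain addition, which suits a two-stage plan. The trade-off is that sq_sum/n − (sum/n)² can lose precision through cancellation when the mean is large relative to the spread; merging per-partition (count, mean, M2) triples with the parallel update of Chan et al. is a common, more numerically stable alternative.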
6 files reviewed, 1 comment
Tip: See Also
[`daft.functions.var`](https://docs.daft.ai/en/stable/api/functions/var/)
"""
from daft.functions import var
Inline import violates custom rule 430ffc3f: imports should be at the top of the file. While this pattern is used consistently across the codebase (e.g., the `stddev()` method), it still conflicts with the stated guideline.
Suggested change:
- from daft.functions import var
  return var(self, ddof)
Move the import to the top of the file with other imports.
Context Used: Rule from dashboard - Import statements should be placed at the top of the file rather than inline within functions or met... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: daft/expressions/expressions.py
Line: 1001:1001
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6105      +/-   ##
==========================================
- Coverage   43.35%   43.31%   -0.05%
==========================================
  Files         917      918       +1
  Lines      113004   113102      +98
==========================================
- Hits        48996    48987       -9
- Misses      64008    64115     +107
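As a quick cross-check of the coverage table: line coverage is hits divided by tracked lines, and hits plus misses should equal lines. The sketch below recomputes the percentages from the table's counts only; it assumes nothing about how Codecov rounds its displayed values.

```python
# Counts copied from the Coverage Diff table (main vs. PR #6105).
BASE = {"lines": 113004, "hits": 48996, "misses": 64008}
HEAD = {"lines": 113102, "hits": 48987, "misses": 64115}


def coverage_pct(report: dict) -> float:
    """Line coverage as a percentage: hits / lines * 100."""
    return 100.0 * report["hits"] / report["lines"]


for name, report in (("main", BASE), ("#6105", HEAD)):
    # Every tracked line is either hit or missed.
    assert report["hits"] + report["misses"] == report["lines"]
    print(f"{name}: {coverage_pct(report):.4f}%")

# Change in percentage points between the two reports.
print(f"delta: {coverage_pct(HEAD) - coverage_pct(BASE):+.4f} pp")
```

The recomputed values land within 0.01 percentage points of the two-decimal figures in the report, and the +98 line delta matches −9 hits plus +107 misses.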
@aaron-ang if you don't mind, in the future could you ask to work on the issue? I ask because someone else was already assigned to the issue. But otherwise, great work, will TAL!
OK, I will choose to work on unassigned issues in the future.
Changes Made
Related Issues
Closes #4705.